Analysis of first run of the benchmarking experiment
1 Loading and tidying the data
We first set up some functions to load and tidy the raw data:
library(tidyverse)
library(broom)
library(ggfortify)
library(jsonlite)
library(knitr)
library(mblm)

# Read the download-server addresses from the config file and return one row
# per (server, storage platform) pair. Assumes three servers per platform.
serversFromConfig <- function(configFile = "../config.json") {
  jsonlite::fromJSON(configFile) |>
    as_tibble() |>
    select(contains("dl")) |>
    mutate(server = str_c("Server ", 1:3), .before = 1) |>
    rename_with(\(x) str_remove(x, "_dl_servers"), !server) |>
    pivot_longer(!server, names_to = "storage", values_to = "ip") |>
    mutate(storage = case_match(
      storage,
      "swarm" ~ "Swarm",
      "ipfs" ~ "IPFS",
      "arw" ~ "Arweave"
    ))
}

# Read the raw results and flatten the nested tests and results into rows.
dataFromJsonRaw <- function(jsonFile = "../results.json") {
  jsonlite::fromJSON(jsonFile) |>
    as_tibble() |>
    unnest(tests) |>
    unnest(results)
}

# Tidy the raw results: fix types and names, then attach the server labels.
dataFromJson <- function(jsonFile = "../results.json") {
  dataFromJsonRaw(jsonFile) |>
    mutate(sha256_match = (sha256_match == "true")) |>
    mutate(storage = ifelse(storage == "Ipfs", "IPFS", storage)) |>
    rename(time_sec = download_time_seconds) |>
    mutate(size_kb = as.integer(size)) |>
    select(!size & !server & !timestamp) |>
    left_join(serversFromConfig(), by = join_by(storage, ip)) |>
    relocate(size_kb, server, time_sec, attempts, sha256_match, .after = storage)
}
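To make the reshaping in `serversFromConfig()` more concrete, here is a minimal sketch on a made-up config. The key names are inferred from the string handling in `serversFromConfig()`; the addresses are placeholders, and the real `config.json` may well contain further fields. The sketch uses `jsonlite` plus base R rather than the tidyverse pipeline, so it illustrates the idea rather than reproducing the function.

```r
library(jsonlite)

# Hypothetical config contents: key names inferred from serversFromConfig(),
# addresses are placeholders and not the servers used in the experiment
configText <- '{
  "swarm_dl_servers": ["swarm-a.example", "swarm-b.example", "swarm-c.example"],
  "ipfs_dl_servers":  ["ipfs-a.example",  "ipfs-b.example",  "ipfs-c.example"],
  "arw_dl_servers":   ["arw-a.example",   "arw-b.example",   "arw-c.example"]
}'

config <- fromJSON(configText)   # named list of three character vectors
long <- stack(config)            # long format with columns `values` and `ind`
names(long) <- c("ip", "storage")

# Strip the common suffix so that only the platform key remains
long$storage <- sub("_dl_servers$", "", as.character(long$storage))

# Within each platform, label the entries as Server 1, 2, 3 in order
long$server <- paste("Server", rep(1:3, times = 3))
long
```

The result has one row per (server, platform) pair, which is the shape that the later join on `storage` and `ip` relies on.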
After loading and tidying the data, here’s what the first few rows of the table look like:
dat <- dataFromJson()
dat |>
  head(n = 10) |>
  kable()

| storage | size_kb | server | time_sec | attempts | sha256_match | ip | latitude | longitude |
|---|---|---|---|---|---|---|---|---|
| Swarm | 1 | Server 1 | 157.7660 | 1 | TRUE | download.gateway.ethswarm.org | 50.4779 | 12.3713 |
| Swarm | 1 | Server 2 | 246.9573 | 1 | TRUE | 188.245.154.61:1633 | 49.4542 | 11.0775 |
| Swarm | 1 | Server 3 | 157.5822 | 1 | TRUE | 188.245.177.151:1633 | 49.4542 | 11.0775 |
| Swarm | 1 | Server 1 | 157.6129 | 1 | TRUE | download.gateway.ethswarm.org | 50.4779 | 12.3713 |
| Swarm | 1 | Server 2 | 220.8360 | 1 | TRUE | 188.245.154.61:1633 | 49.4542 | 11.0775 |
| Swarm | 1 | Server 3 | 157.0362 | 1 | TRUE | 188.245.177.151:1633 | 49.4542 | 11.0775 |
| Swarm | 1 | Server 1 | 163.7713 | 1 | TRUE | download.gateway.ethswarm.org | 50.4779 | 12.3713 |
| Swarm | 1 | Server 2 | 233.7285 | 1 | TRUE | 188.245.154.61:1633 | 49.4542 | 11.0775 |
| Swarm | 1 | Server 3 | 157.0660 | 1 | TRUE | 188.245.177.151:1633 | 49.4542 | 11.0775 |
| Swarm | 1 | Server 1 | 159.3709 | 1 | TRUE | download.gateway.ethswarm.org | 50.4779 | 12.3713 |
We can do some sanity checks. First of all, every download succeeded, in the sense that the SHA-256 checksum matched for each retrieved file:
dat |>
  count(sha256_match) |>
  kable()

| sha256_match | n |
|---|---|
| TRUE | 1350 |
And the experiment is well balanced, with 30 replicates per size, server, and platform:
dat |>
  count(size_kb, server, storage) |>
  kable()

| size_kb | server | storage | n |
|---|---|---|---|
| 1 | Server 1 | Arweave | 30 |
| 1 | Server 1 | IPFS | 30 |
| 1 | Server 1 | Swarm | 30 |
| 1 | Server 2 | Arweave | 30 |
| 1 | Server 2 | IPFS | 30 |
| 1 | Server 2 | Swarm | 30 |
| 1 | Server 3 | Arweave | 30 |
| 1 | Server 3 | IPFS | 30 |
| 1 | Server 3 | Swarm | 30 |
| 10 | Server 1 | Arweave | 30 |
| 10 | Server 1 | IPFS | 30 |
| 10 | Server 1 | Swarm | 30 |
| 10 | Server 2 | Arweave | 30 |
| 10 | Server 2 | IPFS | 30 |
| 10 | Server 2 | Swarm | 30 |
| 10 | Server 3 | Arweave | 30 |
| 10 | Server 3 | IPFS | 30 |
| 10 | Server 3 | Swarm | 30 |
| 100 | Server 1 | Arweave | 30 |
| 100 | Server 1 | IPFS | 30 |
| 100 | Server 1 | Swarm | 30 |
| 100 | Server 2 | Arweave | 30 |
| 100 | Server 2 | IPFS | 30 |
| 100 | Server 2 | Swarm | 30 |
| 100 | Server 3 | Arweave | 30 |
| 100 | Server 3 | IPFS | 30 |
| 100 | Server 3 | Swarm | 30 |
| 1000 | Server 1 | Arweave | 30 |
| 1000 | Server 1 | IPFS | 30 |
| 1000 | Server 1 | Swarm | 30 |
| 1000 | Server 2 | Arweave | 30 |
| 1000 | Server 2 | IPFS | 30 |
| 1000 | Server 2 | Swarm | 30 |
| 1000 | Server 3 | Arweave | 30 |
| 1000 | Server 3 | IPFS | 30 |
| 1000 | Server 3 | Swarm | 30 |
| 10000 | Server 1 | Arweave | 30 |
| 10000 | Server 1 | IPFS | 30 |
| 10000 | Server 1 | Swarm | 30 |
| 10000 | Server 2 | Arweave | 30 |
| 10000 | Server 2 | IPFS | 30 |
| 10000 | Server 2 | Swarm | 30 |
| 10000 | Server 3 | Arweave | 30 |
| 10000 | Server 3 | IPFS | 30 |
| 10000 | Server 3 | Swarm | 30 |
Furthermore, almost all downloads succeeded on the first attempt; only 3 of the 450 Arweave downloads needed a second attempt:
dat |>
  count(storage, attempts) |>
  kable()

| storage | attempts | n |
|---|---|---|
| Arweave | 1 | 447 |
| Arweave | 2 | 3 |
| IPFS | 1 | 450 |
| Swarm | 1 | 450 |
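The sanity checks above can also be written as assertions, so that a future run with a failed checksum or an unbalanced design stops with an error instead of relying on someone reading the tables. The following is a minimal base-R sketch; `toy` and `cellCounts` are hypothetical names, and the toy data only mimic the shape of `dat` (5 sizes, 3 servers, 3 platforms, 30 replicates).

```r
# Hypothetical stand-in for `dat`: a fully crossed design with 30 replicates
toy <- expand.grid(
  size_kb   = c(1, 10, 100, 1000, 10000),
  server    = paste("Server", 1:3),
  storage   = c("Swarm", "IPFS", "Arweave"),
  replicate = 1:30,
  stringsAsFactors = FALSE
)
toy$sha256_match <- TRUE   # assumption: every checksum matched

# Count the rows in each size x server x storage cell
cellCounts <- as.data.frame(table(toy$size_kb, toy$server, toy$storage))

# Stop with an error if a checksum mismatched or a cell is off-balance
stopifnot(
  all(toy$sha256_match),
  nrow(cellCounts) == 5 * 3 * 3,
  all(cellCounts$Freq == 30)
)
```

With the real data, the same `stopifnot()` call could be applied to `dat` directly.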
2 Preliminary analysis
Plotting the raw results, we get:
dat |>
  select(storage | size_kb | server | time_sec) |>
  mutate(storage = fct_reorder(storage, time_sec)) |>
  mutate(size = case_when(
    size_kb == 1 ~ "1 KB",
    size_kb == 10 ~ "10 KB",
    size_kb == 100 ~ "100 KB",
    size_kb == 1000 ~ "1 MB",
    size_kb == 10000 ~ "10 MB"
  )) |>
  mutate(size = fct_reorder(size, size_kb)) |>
  ggplot(aes(x = time_sec, color = storage, fill = storage)) +
  geom_density(alpha = 0.2, bw = 0.05) +
  scale_x_log10(breaks = c(10, 60, 360), labels = c("10s", "1m", "6m")) +
  labs(x = "Retrieval time", y = "Density",
       color = "Platform: ", fill = "Platform: ") +
  scale_color_manual(values = c("steelblue", "goldenrod", "forestgreen")) +
  scale_fill_manual(values = c("steelblue", "goldenrod", "forestgreen")) +
  facet_grid(server ~ size, scales = "fixed") +
  theme_bw() +
  theme(legend.position = "bottom", panel.grid = element_blank())

The x-axis shows retrieval times on a log scale, and the y-axis shows the estimated density of observations, so the curves are higher where there are more data. Colors represent the different storage platforms; facet rows are the different servers used, and facet columns are the various data sizes.
At a glance, we see that IPFS is the fastest. For small files, Swarm is faster than Arweave; for larger files, Swarm is a bit slower but still comparable.
Strangely, IPFS appears to show an “anti-pattern”: larger files lead to shorter retrieval times. Let us look at this more closely, for all three platforms:
dat |>
  mutate(storage = fct_relevel(storage, "Swarm", "IPFS", "Arweave")) |>
  ggplot(aes(x = size_kb, y = time_sec)) +
  geom_point(color = "steelblue", alpha = 0.5) +
  geom_smooth(method = lm, color = "goldenrod", fill = "goldenrod") +
  scale_x_log10() +
  labs(x = "File size (KB)", y = "Download time (seconds)") +
  facet_grid(server ~ storage) +
  theme_bw()

We see that for both IPFS and Arweave, larger files lead to shorter download times. For Arweave and Server 1, this pattern appears reversed, but that is due to the outliers in the largest size category distorting the ordinary least-squares fit. Indeed, a median-based (Theil–Sen) regression detects a decreasing trend:
dat |>
  mutate(storage = fct_relevel(storage, "Swarm", "IPFS", "Arweave")) |>
  ggplot(aes(x = size_kb, y = time_sec)) +
  geom_point(color = "steelblue", alpha = 0.5) +
  # geom_smooth() passes formula, data, and weights; the weights are dropped
  geom_smooth(method = \(formula, data, weights) mblm(formula, data),
              color = "goldenrod", fill = "goldenrod") +
  scale_x_log10() +
  labs(x = "File size (KB)", y = "Download time (seconds)") +
  facet_grid(server ~ storage) +
  theme_bw()

An overall increasing trend is only seen for Swarm, but there the relationship between file size and download time is clearly nonlinear: times initially stagnate or even decrease slightly, before rising sharply for the largest files.
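Nonlinearity aside, all slopes fitted by ordinary least squares (download time against the base-10 logarithm of file size) are very unlikely to be due to chance alone; the p-values below are small enough to be displayed as 0: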
regressionDat <- dat |>
  mutate(size = log10(size_kb)) |>
  nest(data = !storage & !server) |>
  mutate(fit = map(data, \(dat) lm(time_sec ~ size, data = dat))) |>
  mutate(regtab = map(fit, broom::tidy)) |>
  unnest(regtab)
regressionDat |>
  select(!data & !fit) |>
  filter(term != "(Intercept)") |>
  kable()

| storage | server | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|---|
| Swarm | Server 1 | size | 52.29077 | 5.1876652 | 10.079827 | 0 |
| Swarm | Server 2 | size | 53.74020 | 3.3197602 | 16.187977 | 0 |
| Swarm | Server 3 | size | 55.93188 | 3.6941106 | 15.140826 | 0 |
| IPFS | Server 1 | size | -12.92478 | 0.2866705 | -45.085838 | 0 |
| IPFS | Server 2 | size | -10.67000 | 0.3773290 | -28.277707 | 0 |
| IPFS | Server 3 | size | -13.47224 | 0.3235440 | -41.639594 | 0 |
| Arweave | Server 1 | size | 40.63760 | 5.5333705 | 7.344096 | 0 |
| Arweave | Server 2 | size | -15.68051 | 0.2631899 | -59.578667 | 0 |
| Arweave | Server 3 | size | -15.67807 | 0.2632833 | -59.548311 | 0 |
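Because the predictor is the base-10 logarithm of the file size, each slope in the table reads as the change in download time (in seconds) per tenfold increase in file size. The sketch below illustrates this on made-up data and also shows one way to print a very small p-value without it collapsing to 0, which here is presumably a result of rounding when the table is printed. The data, the assumed true slope of -13, and all variable names are hypothetical.

```r
# Hypothetical toy data: time falls by about 13 s per tenfold size increase
set.seed(1)
toy <- data.frame(size_kb = rep(c(1, 10, 100, 1000, 10000), each = 30))
toy$time_sec <- 60 - 13 * log10(toy$size_kb) + rnorm(nrow(toy), sd = 2)

fit <- lm(time_sec ~ log10(size_kb), data = toy)
slope <- unname(coef(fit)[2])   # seconds per tenfold increase in size

# Predicted change in download time from 1 KB to 10 MB (four decades)
changeOverRange <- slope * (log10(10000) - log10(1))

# Format the slope's p-value so that tiny values are not shown as 0
pValue <- summary(fit)$coefficients[2, "Pr(>|t|)"]
format.pval(pValue, digits = 3)
```

Read this way, a slope of about -13, as for IPFS on Server 1, corresponds to roughly 52 seconds less download time for a 10 MB file than for a 1 KB file under the linear fit.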
However, the assumptions behind linear regression do not hold well for Swarm and for Arweave under Server 1:
regressionDat |>
  filter(term != "(Intercept)") |>
  mutate(diagnostics = map(fit, \(x) {
    autoplot(x, smooth.colour = NA, alpha = 0.3, colour = "steelblue") +
      theme_bw()
  })) |>
  mutate(diagnostics = pmap(list(diagnostics, storage, server), \(dia, sto, se) {
    gridExtra::grid.arrange(grobs = dia@plots, top = str_c(sto, ", ", se))
  })) |>
  suppressMessages() |>
  capture.output() |>
  invisible()

For this reason, let us re-generate the regression tables, but using Theil–Sen linear regression instead. The slopes agree in sign with the least-squares fits, although the Swarm slopes are noticeably smaller, and the slope for (Arweave, Server 1) is reversed, from positive to negative:
dat |>
  mutate(size = log10(size_kb)) |>
  nest(data = !storage & !server) |>
  mutate(fit = map(data, \(dat) mblm(time_sec ~ size, dataframe = dat))) |>
  mutate(regtab = map(fit, broom::tidy)) |>
  unnest(regtab) |>
  select(!data & !fit) |>
  filter(term != "(Intercept)") |>
  mutate(p.value = round(p.value, 5)) |>
  kable()

| storage | server | term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|---|---|
| Swarm | Server 1 | size | 12.52489 | 10.005017 | 10887 | 0.00000 |
| Swarm | Server 2 | size | 38.18759 | 22.489381 | 11308 | 0.00000 |
| Swarm | Server 3 | size | 25.53666 | 11.496877 | 11295 | 0.00000 |
| IPFS | Server 1 | size | -13.15176 | 1.562340 | 0 | 0.00000 |
| IPFS | Server 2 | size | -11.46449 | 2.503900 | 0 | 0.00000 |
| IPFS | Server 3 | size | -13.46260 | 1.729444 | 0 | 0.00000 |
| Arweave | Server 1 | size | -13.09084 | 2.669667 | 3822 | 0.00056 |
| Arweave | Server 2 | size | -15.65500 | 1.436399 | 0 | 0.00000 |
| Arweave | Server 3 | size | -15.65599 | 1.428417 | 0 | 0.00000 |
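To illustrate why a median-based fit can reverse the sign of a least-squares slope, here is a minimal base-R sketch of the plain Theil–Sen estimator (the median of all pairwise slopes) on made-up data with a few slow outliers in the largest size category. The function `theilSenSlope()`, the toy data, and the outlier size are all hypothetical. Note also that `mblm()` is called with its defaults above, which, as far as we are aware, use Siegel's repeated-medians variant (`repeated = TRUE`) rather than the plain pairwise median sketched here.

```r
# Theil-Sen slope: the median of the slopes between all pairs of points
theilSenSlope <- function(x, y) {
  idx <- combn(length(x), 2)           # all index pairs (i, j) with i < j
  dx  <- x[idx[2, ]] - x[idx[1, ]]
  dy  <- y[idx[2, ]] - y[idx[1, ]]
  median(dy[dx != 0] / dx[dx != 0])    # skip pairs that share the same x
}

# Hypothetical toy data: a decreasing trend plus five very slow downloads
set.seed(42)
x <- rep(0:4, each = 30)               # log10 of file size in KB
y <- 60 - 13 * x + rnorm(length(x), sd = 2)
outliers <- which(x == 4)[1:5]
y[outliers] <- y[outliers] + 600

olsSlope <- unname(coef(lm(y ~ x))[2])
tsSlope  <- theilSenSlope(x, y)
c(ols = olsSlope, theil_sen = tsSlope)
```

In this toy setting, five outliers out of 150 points are enough to make the least-squares slope positive, while the pairwise-median slope stays close to the underlying decreasing trend. This is the same qualitative pattern as for (Arweave, Server 1).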